ABSTRACT

The 3D structure and motion of an object are determined from its optical
flow under orthographic projection. First, the image domain is divided
into planar or almost planar regions by checking the flow. For each
region, parameters of the flow are determined. Transformation rules under
coordinate changes and hydrodynamic analogies are also discussed. The 3D
structure and motion are determined in explicit forms in terms of
irreducible parameters deduced from group representation theory. The
solution is not unique, containing an indeterminate scale factor and
comprising true and spurious solutions. Their geometrical interpretations
are also studied. The spurious solution disappears if two or more regions
of the object are observed.
1. INTRODUCTION


Determination of the 3D structure and motion of an object from its
projected 2D images is one of the most important tasks of computer vision.
This can be done by introducing various heuristic or a priori
"constraints" such as planarity or smoothness of the object and rigidity
of the motion, coupled with various other sources of information like
texture, shading, etc. (e.g., [1]). Basically, there are two approaches.
One is first to seek "correspondence" of points, i.e., knowledge of which
point moves to which one, between two sequential images, resulting in a
so-called "optical flow," and various techniques have been tried to detect
the optical flow (e.g., see [2 - 6]). The other approach does not use the
point correspondence or the optical flow but directly measures some sorts
of "features" of the image. The 3D motion is detected from these features
and their time changes alone if a particular model is assumed for the
object. For example, the 3D motion of a planar surface can be detected
from statistical properties of its texture [7] or its contour [8, 9].

In this paper, we take the first approach and assume that an optical
flow is already obtained. There have been many studies of schemes for
computing, from a given optical flow, the 3D motion of the object or of
the observer seeing a stationary environment ("egomotion" from the "motion
parallax" [10]). Studies of this type have often been associated with the
"computational approach" to human perception [2, 11, 12]. Roughly
speaking, procedures are classified into two groups: "correspondence-based"
approaches and "flow-based" ones. The correspondence-based approach
picks up several correspondence pairs out of the optical flow, and
subsequent computation is based on their coordinates, assuming no specific
model about the object except the rigidity of the motion. Then, the 3D
structure and motion are recovered numerically, e.g., see [13 - 15] for
orthographic projection and [16 - 23] for central projection. On the
other hand, the flow-based approach assumes a certain model about the
object or the scene and examines the flow pattern, extracting some
features like the "focus of expansion" or "vanishing point," taking
spatial derivatives, estimating parameters by global model fitting and
sometimes employing hydrodynamic analogies, e.g., see [24 - 34] for
central projection.

In this paper, we present a flow-based approach under orthographic
projection, which applies when the size of the object is small
compared with the distance from the viewer. So far, no flow-based
approach has been attempted under orthographic projection. Probably,
it has been believed that not enough clues are obtained as to the 3D
structure and motion under orthographic projection, since no looming up or
shrinking away is caused by motions. It is true that much information is
lost by orthographic projection. For instance, we cannot tell the absolute
distance from the viewer or whether the object is approaching or receding.
However, it is possible to extract information about the relative depth,
its translation and 3D rotation. The solution is not
unique, as was pointed out by Sugihara and Sugie [15]. Yet, we can
describe geometrical relationships among those indeterminate solutions.
Here, we try to extract as much knowledge as possible in
"analytical terms" under such "imperfect information." This is very
important in practice, because we often fail to observe the effects of
central projection when, say, the object is too small or is located far
away or the focal length of the camera is not small enough. In these
cases, even partial knowledge is useful, for it can be supplemented by
other sources of information.


We  first  divide  the  image  domain  into  regions  which  can  be  regarded  as 
being  planar  or  almost  planar.  A  criterion  for  it  is  discussed.  Then,  we 
pick  up  each  region  and  compute  its  surface  orientation  and  motion.  We 
take  a  Cartesian  xy-coordinate  system  on  the  image  plane  and  the  z-axis 
perpendicular  to  it.  However,  the  choice  of  the  xy-coordinate  system  is 
completely  arbitrary  in  principle.  Hence,  interpretations  based  on 
different  xy-coordinate  systems  must  coincide.  In  other  words, 
interpretations  of  3D  structure  and  motion  must  be  "invariant."  In  order 
to  make  our  scheme  invariant,  we  first  study  the  transformation  rules  when 
one  coordinate  system  is  replaced  by  another.  This  consideration  has  a 
practical  significance,  for  it  is  sometimes  convenient  to  take  different 
coordinate  systems  for  different  regions  of  the  same  object,  taking  the 
coordinate  origin  in  each  region,  for  instance.  Then,  quantities 
associated  with  different  regions  must  be  compared  after  appropriate 
transformations.

We  invoke  group  representation  theory  [35  -  38]  to  extract 
"irreducible  parameters"  with  respect  to  coordinate  changes.  We  also  make 
use  of  "hydrodynamic  analogies,"  viewing  the  optical  flow  as  if  it  were  a 
flow  of  real  fluid.  Since  hydrodynamics  is  usually  described  in  invariant 
forms,  it  gives  a  clear  understanding  of  the  invariant  nature  of  our 
interpretation.  Then,  we  express  the  surface  orientation  and  the  motion 
in  an  "explicit"  form,  which  is  made  possible  by  the  use  of  the 
irreducible  parameters.  It  turns  out  that  there  are  two  types  of 
solutions  for  each  planar  region.  One  is  the  true  one  and  the  other  is  a 
"spurious"  one,  each  containing  one  indeterminate  scale  parameter.  We 
describe  the  geometrical  relationship  between  the  true  and  the  spurious 
solutions in an invariant manner. We also show that the spurious solution
disappears if two or more different regions of the same object are
observed.

Our  flow-based  approach  to  reconstruct  the  3D  structure  and  motion  is  a 
generalization  of  the  correspondence-based  approach  of  Sugihara  and  Sugie 
[15],  which  can  be  viewed  as  a  special  case  of  ours.  In  order  to  apply 
their  correspondence-based  approach,  the  velocity  measurement  must  be 
accurate.  In  contrast,  the  flow-based  approach  extracts  global 
quantities,  which  are  in  general  less  sensitive  to  noise  and  possible 
misdetection  of  correspondence.  Thus,  our  flow-based  approach  bears  a 
practical  significance.  Moreover,  many  important  observations  such  as  the 
invariance,  transformation  rules,  hydrodynamic  analogies,  the  "spurious" 
solution and its geometrical interpretation cannot be easily realized from
a  correspondence-based  approach  like  that  of  Sugihara  and  Sugie  [15]. 
Their  reasoning  is  also  insufficient  to  give  the  degree  of  indeterminacy. 
This  is  because  we  solve  the  problem  in  analytical  terms,  while  the  scheme 
of  Sugihara  and  Sugie  [15]  gives  solutions  only  numerically.  Another  big 
advantage  of  the  flow-based  approach  is  that  the  procedure  can  be  applied 
to  the  case  where  no  optical  flow  is  available  or  no  correspondence  of 
points  is  detected.  This  is  because  the  formulation  rests  on  the  "flow 
parameters"  extracted  from  the  optical  flow.  Sometimes,  these  parameters 
can  be  computed  directly  from  the  image  sequence  itself  without  detecting 
point  correspondence  (Kanatani  [7  -  9]). 

2.  IDENTIFICATION  OF  OPTICAL  FLOW 

Suppose a plane is moving in the scene and we are looking at its image
on the xy-plane orthographically projected along the z-axis. Let z = px +
qy + r be the equation of the plane. The orientation of the plane is
specified by the two parameters p and q, which are often referred to as
the "gradients" because p = ∂z/∂x and q = ∂z/∂y. We take (0, 0, r), the
intersection between the plane and the z-axis, as a reference point and
assume that the plane is translating with translation velocity (a, b, c)
at the reference point and is rotating with rotation velocity (w1, w2,
w3), i.e., rotating by sqr(w1^2 + w2^2 + w3^2) rad/sec screwwise around an
axis along (w1, w2, w3) at the reference point (Fig. 1). (Here, sqr(.)
stands for the square root.) Since the absolute depth r and the velocity c
in the z-direction are indiscernible, our goal is to determine the
gradients p and q, the translation velocities a and b and the rotation
velocities w1, w2 and w3 by observing the projected image.

If a plane with gradients p, q is moving with translation velocities a,
b and with rotation velocities w1, w2, w3 at (0, 0, r), an elementary
calculation shows that the x- and y-components of the velocity of a point
(x, y, z) on the plane are given by

u(x, y) = a + (pw2)x + (qw2 - w3)y,  (2.1)

v(x, y) = b - (pw1 - w3)x - (qw1)y,  (2.2)

respectively.  This  is  called  the  "optical  flow."  As  was  stated  before, 
we  assume  that  the  optical  flow  is  already  available  at  particular  feature 
points.  We  first  try  to  fit  the  following  form  to  the  observed  flow: 

u(x,  y)  =  a  +  Ax  +  By,  (2.3) 

v(x,  y)  =  b  +  Cx  +  Dy.  (2.4) 

Here, we call the parameters a, b, A, B, C, D the "flow parameters." The
simplest way may be to use the least squares method to minimize

\[ M = \iint \left[ (a + Ax + By - u(x, y))^2 + (b + Cx + Dy - v(x, y))^2 \right] dx\,dy. \tag{2.5} \]

From ∂M/∂a = 0, ∂M/∂b = 0, ∂M/∂A = 0, ∂M/∂B = 0, ∂M/∂C = 0, ∂M/∂D = 0, we
obtain the following set of equations called the "normal equations:"


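\[
\begin{aligned}
\left(\iint dx\,dy\right) a + \left(\iint x\,dx\,dy\right) A + \left(\iint y\,dx\,dy\right) B &= \iint u(x, y)\,dx\,dy, \\
\left(\iint dx\,dy\right) b + \left(\iint x\,dx\,dy\right) C + \left(\iint y\,dx\,dy\right) D &= \iint v(x, y)\,dx\,dy, \\
\left(\iint x\,dx\,dy\right) a + \left(\iint x^2\,dx\,dy\right) A + \left(\iint xy\,dx\,dy\right) B &= \iint x\,u(x, y)\,dx\,dy, \\
\left(\iint y\,dx\,dy\right) a + \left(\iint xy\,dx\,dy\right) A + \left(\iint y^2\,dx\,dy\right) B &= \iint y\,u(x, y)\,dx\,dy, \\
\left(\iint x\,dx\,dy\right) b + \left(\iint x^2\,dx\,dy\right) C + \left(\iint xy\,dx\,dy\right) D &= \iint x\,v(x, y)\,dx\,dy, \\
\left(\iint y\,dx\,dy\right) b + \left(\iint xy\,dx\,dy\right) C + \left(\iint y^2\,dx\,dy\right) D &= \iint y\,v(x, y)\,dx\,dy.
\end{aligned}
\tag{2.6}
\]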

From these, we can generally determine estimates of the flow parameters a,
b, A, B, C, D. In particular, if the domain of observation is symmetric
with respect to both the x- and the y-axes and has area S, the estimates
are explicitly given by

\[
\begin{aligned}
a &= \frac{1}{S}\iint u(x, y)\,dx\,dy, & b &= \frac{1}{S}\iint v(x, y)\,dx\,dy, \\
A &= \frac{\iint x\,u(x, y)\,dx\,dy}{\iint x^2\,dx\,dy}, & B &= \frac{\iint y\,u(x, y)\,dx\,dy}{\iint y^2\,dx\,dy}, \\
C &= \frac{\iint x\,v(x, y)\,dx\,dy}{\iint x^2\,dx\,dy}, & D &= \frac{\iint y\,v(x, y)\,dx\,dy}{\iint y^2\,dx\,dy}.
\end{aligned}
\tag{2.7}
\]

Of course, we must replace the integrals by appropriate summations, since
the velocity is observed only at a finite number of points.
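As a minimal sketch of this fitting step, assuming the integrals of (2.6) are replaced by plain sums over the feature points, the normal equations can be solved as below. The names solve3 and fit_flow are ours, introduced only for illustration, and the small Gauss-Jordan solver stands in for whatever linear solver is actually at hand.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Illustrative absolute tolerance; e.g. collinear points end up here.
            raise ValueError("degenerate point configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(3):
            if row != col:
                f = aug[row][col] / aug[col][col]
                aug[row] = [x - f * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_flow(samples):
    """samples: iterable of (x, y, u, v). Returns ((a, b, A, B, C, D), M)."""
    samples = list(samples)
    n = float(len(samples))
    sx = sum(x for x, y, u, v in samples)
    sy = sum(y for x, y, u, v in samples)
    sxx = sum(x * x for x, y, u, v in samples)
    sxy = sum(x * y for x, y, u, v in samples)
    syy = sum(y * y for x, y, u, v in samples)
    gram = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    # The u- and v-equations of (2.6) decouple and share the same matrix.
    a, A, B = solve3(gram, [sum(u for x, y, u, v in samples),
                            sum(x * u for x, y, u, v in samples),
                            sum(y * u for x, y, u, v in samples)])
    b, C, D = solve3(gram, [sum(v for x, y, u, v in samples),
                            sum(x * v for x, y, u, v in samples),
                            sum(y * v for x, y, u, v in samples)])
    # Discrete counterpart of the squared error M in (2.5).
    M = sum((a + A * x + B * y - u) ** 2 + (b + C * x + D * y - v) ** 2
            for x, y, u, v in samples)
    return (a, b, A, B, C, D), M
```

Calling fit_flow on velocities sampled from an exactly linear flow should return the generating parameters with M close to zero.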

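For the estimates a, b, A, B, C, D, the "residual" M of (2.5) becomes

\[
\begin{aligned}
M = {} & \iint u^2\,dx\,dy - \left[ \left(\iint u\,dx\,dy\right) a + \left(\iint x u\,dx\,dy\right) A + \left(\iint y u\,dx\,dy\right) B \right] \\
& + \iint v^2\,dx\,dy - \left[ \left(\iint v\,dx\,dy\right) b + \left(\iint x v\,dx\,dy\right) C + \left(\iint y v\,dx\,dy\right) D \right].
\end{aligned}
\tag{2.8}
\]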

An optical flow can be that of a plane motion if and only if M = 0. From
this, we obtain a "criterion of planarity," taking account of possible
errors. Namely, we can view a computed optical flow as that of plane
motion if M given by (2.8) is less than a certain threshold. (Note that
the optical flow is uniquely determined if velocities are given at
three or more feature points, cf. Section 8.) This suggests the following
procedure. Starting from three or more feature points where the
residual is very small, we add to them other points from the surroundings
one by one, each time recomputing the flow parameters and checking the
residual M, as long as the residual M does not exceed a prescribed
threshold. Then, we end up with a region which is almost planar. The
procedure is repeated for the rest of the regions, and the image is
roughly decomposed into almost planar small regions. (Exact boundaries of
these small regions are not necessary. They are reconstructed by the
procedures of Section 7.)
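The region-growing procedure can be sketched as follows. This is only an illustration under our own assumptions: the function name grow_region is hypothetical, the residual is supplied as a callable (for instance, a discrete version of M in (2.8)), candidates are assumed to arrive ordered by proximity to the seed, and a point that pushes the residual over the threshold is simply skipped rather than ending the growth.

```python
def grow_region(seed, candidates, residual, threshold):
    """Greedily grow an almost planar region.

    seed       -- initial feature points (three or more) with small residual
    candidates -- remaining points, assumed ordered by proximity to the seed
    residual   -- callable mapping a list of points to the residual M
    threshold  -- prescribed upper bound on M
    Returns (region, rejected).
    """
    region = list(seed)
    rejected = []
    for point in candidates:
        trial = region + [point]
        # Refit on the enlarged set and keep the point only if M stays small.
        if residual(trial) <= threshold:
            region = trial
        else:
            rejected.append(point)
    return region, rejected
```

The rejected points can then serve as the pool from which the next region is seeded.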


In the following, we assume that a given optical flow can be regarded
as that of plane motion and the flow parameters a, b, A, B, C, D are
already estimated. Thus, the flow parameters a, b, A, B, C, D are the
only available data. Hence, any information must be expressed in terms of
a, b, A, B, C, D. We also use the matrix form

u = a + Ar,  (2.9)

where a = (a, b), r = (x, y) and

\[ \mathbf{A} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}. \tag{2.10} \]

We do not make a particular distinction between a column vector and a row
vector because we can easily tell which is which.

Note that we need not necessarily know the optical flow or detect
point correspondence, because all we need is the flow parameters, not
the flow itself. For example, if we use the methods of Kanatani [7 - 9],
the flow parameters are determined directly without knowing
correspondence.


On the other hand, consider an extreme case where each almost-planar
patch consists only of three points. This amounts to polyhedral
approximation of the object. Then, our flow-based approach is equivalent
to the correspondence-based approach (cf. Example 3 in Section 8). In
other words, our flow-based approach is a generalization of the
correspondence-based approach, including it as an extreme case.

3.  COORDINATE  CHANGE  AND  INVARIANCE 

As  was  stated  in  the  previous  section,  all  information  we  get  is  the 
flow parameters a, b, A, B, C, D alone, from which we want to compute the
gradients p, q and the motion parameters a, b, w1, w2, w3. However, all
of  these  parameters  are  defined  with  respect  to  a  given  Cartesian  xy- 
coordinate  system  on  the  image  plane,  and  hence,  if  we  use  another  xy- 
coordinate  system,  we  obtain  different  values.  Since  we  can  take  an 
arbitrary  Cartesian  coordinate  system  on  the  image  plane,  the 
interpretation  of  the  structure  and  motion  must  be  "invariant"  with 
respect  to  coordinate  systems.  To  be  precise,  structure  and  motion 
parameters  computed  from  different  values  of  the  flow  parameters  due  to 
coordinate  change  must  coincide  with  those  obtained  by  transforming  the 
original  interpretation  accordingly.  In  short,  interpretation  must 
"commute"  with  coordinate  change  (Fig.  2).  In  the  following,  we  consider 
a group-theoretical way to exploit this invariance. Although it may
seem too pompous to invoke sophisticated mathematics for a very simple
problem  like  the  present  one,  this  consideration  provides  us  with  a 
transparent  viewpoint,  an  elegant  closed  formulation  and  also  a  strong 
guidance  to  cope  with  more  complicated  problems  like  images  under  central 
projection,  etc.  Besides,  it  is  sometimes  convenient  to  use  different 
coordinate  systems  for  different  regions.  Then,  we  need  the 
transformation  rules  for  coordinate  changes  in  order  to  combine  the 
results. 

Suppose we take an x'y'-coordinate system by rotating the original
xy-coordinate system by angle t counterclockwise and by translating it by
(h1, h2) (Fig. 3). The origin O' of the x'y'-coordinate system is at (h1,
h2) of the xy-coordinate system. If (x, y) are the coordinates with
respect to the former coordinate system and (x', y') with respect to the
latter, their relationship is given by

r' = R(r - h),  (3.1)

where r = (x, y), r' = (x', y'), h = (h1, h2) and

\[ R = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}. \tag{3.2} \]

The coordinate transformations of this type, which we denote by (R, h),
form a group known as the 2D "Euclidean transformation group." Consider
another transformation r" = R'(r' - h'). If we operate (3.1) followed by
this, we obtain r" = R'(R(r - h) - h') = R'R(r - (R^T h' + h)), which is
transformation (R'R, R^T h' + h), where T designates the transpose. (Note
that R is an orthogonal matrix so that R^-1 = R^T.) In other words, the
group operation is given by

(R', h')(R, h) = (R'R, R^T h' + h).  (3.3)
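The composition rule (3.3) is easy to check numerically. The sketch below is ours, not part of the original formulation: the helper names (rot, transform, compose) and the sample angles and offsets are arbitrary choices made only for illustration.

```python
import math


def rot(t):
    """Rotation matrix R of eqn (3.2)."""
    c, s = math.cos(t), math.sin(t)
    return ((c, s), (-s, c))


def matvec(m, v):
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))


def matmul(m, n):
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def transpose(m):
    return tuple(zip(*m))


def transform(g, r):
    """Apply (R, h) to a point: r' = R(r - h), eqn (3.1)."""
    R, h = g
    return matvec(R, (r[0] - h[0], r[1] - h[1]))


def compose(g2, g1):
    """Group operation (3.3): (R', h')(R, h) = (R'R, R^T h' + h)."""
    R2, h2 = g2
    R1, h1 = g1
    back = matvec(transpose(R1), h2)
    return (matmul(R2, R1), (back[0] + h1[0], back[1] + h1[1]))
```

Applying g1 and then g2 to a point should give the same result as applying compose(g2, g1) once.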

If u, v are the velocities with respect to the original coordinate
system and u', v' with respect to the new one, it is easy to see

u' = Ru,  (3.4)

where u = (u, v) and u' = (u', v'). In other words, u and v are
transformed "as a vector." Next, consider the gradients p, q. Since p =
∂z/∂x and q = ∂z/∂y, they must be transformed as a vector, i.e.,

p' = Rp,  (3.5)

where p = (p, q) and p' = (p', q'). Consider the rotation velocities w1,
w2, w3. Obviously, w3 is an invariant under coordinate changes, or a
"scalar," and w1, w2 are transformed as a vector, i.e.,

w' = Rw,  (3.6)

where w = (w1, w2) and w' = (w1', w2'), because they are projections of a
3D vector. The fact that (p, q) and (w1, w2) are transformed as vectors
implies that they have invariant meanings. In fact, (p, q) indicates the
direction of "maximum gradient" or the "steepest ascent." The magnitude of
(p, q) is the incline along that direction. On the other hand, (w1, w2)
indicates the "axis of rotation" in the xy-plane. In other words, if
rotation w3 around the z-axis is not considered, the rotation is realized
by rotating the plane around that axis by sqr(w1^2 + w2^2) (rad/sec)
screwwise.

On the other hand, a, b are "not" transformed as a vector. In fact,
the rotation velocities w1, w2, w3 are defined at the reference point (0,
0, r), and this rotation induces translation velocity (w1, w2, w3) × (x,
y, z - r) at point (x, y, z). Since the new reference point goes to (h1,
h2, r + ph1 + qh2), the translational velocity induced there is

\[
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} \times
\begin{pmatrix} h_1 \\ h_2 \\ p h_1 + q h_2 \end{pmatrix} =
\begin{pmatrix} p w_2 & q w_2 - w_3 \\ -p w_1 + w_3 & -q w_1 \\ -w_2 & w_1 \end{pmatrix}
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix}.
\tag{3.7}
\]

Comparing this with eqns (2.1) and (2.2), we obtain the transformation
rule

a' = R(a + Ah),  (3.8)

where a' = (a', b').

Finally, note that if parameters s1, s2 are transformed as a vector,
the complex number s1 + is2 is transformed with "weight" -1, i.e., by
multiplication of e(-t). (Here, i is the imaginary unit, and we use the
abbreviation e(.) for exp(i.).) Hence, if we put

P = p + iq,  W = w1 + iw2,  (3.9)

we have

P' = e(-t)P,  W' = e(-t)W.  (3.10)

The significance of this consideration will become clear in subsequent
sections. In the following, we also regard the complex numbers P and W as
2D vectors.
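The statement that a vector (s1, s2) corresponds to a complex number of weight -1 can be confirmed with a few lines of arithmetic. The function names below are our own, and the sketch assumes the rotation matrix of eqn (3.2).

```python
import cmath
import math


def e(t):
    """Abbreviation used in the text: e(t) = exp(i t)."""
    return cmath.exp(1j * t)


def rotate_components(s1, s2, t):
    """Transform (s1, s2) as a vector, i.e. multiply by R of eqn (3.2)."""
    c, s = math.cos(t), math.sin(t)
    return c * s1 + s * s2, -s * s1 + c * s2


def weight_minus_one(s1, s2, t):
    """Return (s1' + i s2', e(-t)(s1 + i s2)); the two values should agree."""
    r1, r2 = rotate_components(s1, s2, t)
    return complex(r1, r2), e(-t) * complex(s1, s2)
```

For any real s1, s2 and t, the two returned values should coincide up to rounding, which is eqn (3.10) applied to P or W.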

4.  IRREDUCIBLE  PARAMETERS  AND  THEIR  INVARIANCE 

As was stated earlier, the optical flow is first expressed by eqn
(2.9). After the coordinate change (3.1), it becomes u' = Ru = R(a + Ar)
= R(a + A(R^T r' + h)) = (Ra + RAh) + RAR^T r'. Comparing this with eqn
(2.9), we find that the new flow parameters a', b', A', B', C', D' are
given by

a' = Ra + RAh,  A' = RAR^T,  (4.1)

where a' = (a', b') and A' is the matrix of A', B', C', D' like eqn
(2.10). Here, we again obtain eqn (3.8), the transformation of a, b.
We can also see that A is transformed "as a tensor."
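The transformation rule (4.1) can be exercised numerically: the flow evaluated in the new coordinates with the new parameters should equal R applied to the original flow. The sketch below is an illustration with helper names of our own choosing (transform_flow, flow); the numerical values used to try it are arbitrary.

```python
import math


def rot(t):
    """Rotation matrix R of eqn (3.2)."""
    c, s = math.cos(t), math.sin(t)
    return ((c, s), (-s, c))


def matvec(m, v):
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))


def matmul(m, n):
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def transpose(m):
    return tuple(zip(*m))


def transform_flow(a, A, t, h):
    """Eqn (4.1): a' = R(a + A h), A' = R A R^T."""
    R = rot(t)
    Ah = matvec(A, h)
    a_new = matvec(R, (a[0] + Ah[0], a[1] + Ah[1]))
    A_new = matmul(matmul(R, A), transpose(R))
    return a_new, A_new


def flow(a, A, r):
    """Eqn (2.9): u = a + A r."""
    Ar = matvec(A, r)
    return (a[0] + Ar[0], a[1] + Ar[1])
```

For a point r with new coordinates r' = R(r - h), flow(a', A', r') should match R applied to flow(a, A, r).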

Apparently, the transformation of the flow parameters from a, b, A, B,
C, D to a', b', A', B', C', D' is a linear transformation, which we denote
by rep[(R, h)]. This gives a six-dimensional "representation" of the 2D
Euclidean group. Indeed, if we transform a', b', A', B', C', D' into a",
b", A", B", C", D" by transformation (R', h'), we get a" = R'a' + R'A'h' =
R'(Ra + RAh) + R'(RAR^T)h' = (R'R)a + (R'R)A(R^T h' + h) and A" = R'A'R'^T
= R'RAR^T R'^T = (R'R)A(R'R)^T. In view of eqns (4.1), this composite
transformation is rep[(R'R, R^T h' + h)], which is equal to rep[(R', h')(R,
h)] by eqn (3.3). Hence,

rep[(R', h')]rep[(R, h)] = rep[(R', h')(R, h)],  (4.2)

and rep[(R, h)] is really a representation.

From eqns (4.1), we find that this representation is "reducible" into a
and A because parameters A are transformed among themselves. (However,
this representation is not "completely reducible," since the remaining
parameters a are not transformed among themselves.) In other words, A
induces a four-dimensional representation of the 2D rotation group. This
representation is completely reducible because the rotation group is
topologically a "compact" group. It is also seen that it is decomposed
into "irreducible representations," all of which are one-dimensional,
i.e., multiplication of e(nt) with integer n ("weight"). This is a
consequence of Schur's lemma and the fact that the 2D rotation group is a
"commutative" group (cf. [35, 36]).


As is well known, the decomposed irreducible representations are given
by calculating the "character" of the representation. Let us compute the
"trace" of the transformation A' = RAR^T. If A = 1, B = C = D = 0, then A'
= cos^2 t, which is the first diagonal element. Likewise, we can see that
all the diagonal elements are cos^2 t. Hence, the character is 4cos^2 t =
2 + 2cos2t = 2 + e(2t) + e(-2t). Thus, there are two "absolute invariants"
(weight 0) and two "relative invariants" of weight 2 and -2. The last
two are mutually complex conjugate because the representation space is
real.


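The process of decomposition is given by Weyl's principle of exploiting
the symmetry of tensor A (cf. [37, 38]). It is decomposed into
symmetric and antisymmetric parts as

\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix} =
\begin{pmatrix} A & (B+C)/2 \\ (B+C)/2 & D \end{pmatrix} +
\begin{pmatrix} 0 & -(C-B)/2 \\ (C-B)/2 & 0 \end{pmatrix}.
\tag{4.3}
\]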


This decomposition is "invariant," i.e., independent of the coordinate
change. Hence, A, B + C, D are transformed among themselves, and R = C -
B is an absolute invariant. (This scalar R should not be confused with the
rotation matrix R of eqn (3.2).) The symmetric part is further decomposed
into the "scalar" and "deviator" parts as

\[
\begin{pmatrix} A & (B+C)/2 \\ (B+C)/2 & D \end{pmatrix} =
\frac{A+D}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} +
\begin{pmatrix} (A-D)/2 & (B+C)/2 \\ (B+C)/2 & -(A-D)/2 \end{pmatrix},
\tag{4.4}
\]

which is also invariant, and T = A + D is an absolute invariant. The set
of A - D, B + C is transformed by

\[
\begin{pmatrix} A' - D' \\ B' + C' \end{pmatrix} =
\begin{pmatrix} \cos 2t & \sin 2t \\ -\sin 2t & \cos 2t \end{pmatrix}
\begin{pmatrix} A - D \\ B + C \end{pmatrix}.
\tag{4.5}
\]

The decomposition so far corresponds to the expression 2 + 2cos2t of the
character. If we consider the complex parameter S = (A - D) + i(B + C), it
is transformed with weight -2, i.e., S' = e(-2t)S. We call the three
invariant parameters

T = A + D,  R = C - B,  S = (A - D) + i(B + C),  (4.6)

the "irreducible parameters" of the optical flow.
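The invariance properties of the irreducible parameters (4.6) can be verified numerically. The following sketch, with function names of our own, rotates the tensor by A' = RAR^T and compares T, R and S before and after; it assumes the rotation matrix of eqn (3.2).

```python
import cmath
import math


def irreducible(A, B, C, D):
    """Eqn (4.6): returns (T, R, S) with S complex."""
    return A + D, C - B, complex(A - D, B + C)


def rotate_tensor(A, B, C, D, t):
    """Entries (A', B', C', D') of R A R^T for R = [[cos t, sin t], [-sin t, cos t]]."""
    c, s = math.cos(t), math.sin(t)
    # First the product R A ...
    m11, m12 = c * A + s * C, c * B + s * D
    m21, m22 = -s * A + c * C, -s * B + c * D
    # ... then multiply by R^T = [[c, -s], [s, c]] from the right.
    return (m11 * c + m12 * s, -m11 * s + m12 * c,
            m21 * c + m22 * s, -m21 * s + m22 * c)


def check_invariance(A, B, C, D, t):
    """Return the deviations of T, R and S from their expected behavior."""
    T, R, S = irreducible(A, B, C, D)
    T2, R2, S2 = irreducible(*rotate_tensor(A, B, C, D, t))
    return abs(T2 - T), abs(R2 - R), abs(S2 - cmath.exp(-2j * t) * S)
```

All three deviations returned by check_invariance should vanish up to rounding: T and R are unchanged, while S is multiplied by e(-2t).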

Lastly, let us consider "hydrodynamic analogies." If the optical flow
(2.9) is regarded as the flow of real fluid, each of the above invariants
has a physical meaning. For example, T = ∂u/∂x + ∂v/∂y is the
"divergence," and the first term of the right-hand side of eqn (4.4)
describes a flow like Fig. 4. Similarly, R = ∂v/∂x - ∂u/∂y is the
"rotation" or "vorticity" of the flow, and the second term of the
right-hand side of eqn (4.3) describes the flow of Fig. 5 (cf. [7, 25,
26]). The second term of the right-hand side of eqn (4.4) describes a
"pure shear flow" like Fig. 6. Consider the polar representation
abs(S)e(arg(S)) of S. Since S rotates by 2t clockwise around the origin
on the complex plane when the xy-coordinate system is rotated by t
counterclockwise, Q1 = e(arg(S)/2) and Q2 = iQ1 both rotate by t
clockwise. This means that they are transformed "as vectors" with weight
-1, and hence their orientations have invariant meanings. As a matter of
fact, Q1 and Q2 represent the directions of "maximum extension" and
"maximum compression," respectively, in hydrodynamics. This becomes clear
if we rotate the coordinate system by arg(S)/2 counterclockwise so that Q1
and Q2 coincide with the x- and the y-axis, respectively. Then, the last
term on the right-hand side of eqn (4.4), which describes the "pure shear
flow," is diagonalized as

\[ \frac{\mathrm{abs}(S)}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{4.7} \]

which is the "canonical form" of the pure shear flow (Fig. 7). The
orientations of Q1 and Q2 are called the "principal axes" of the flow.
The magnitude abs(S) = sqr((A - D)^2 + (B + C)^2) is an absolute invariant
and is called the "shear strength" in hydrodynamics.
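To make the role of the principal axes concrete, the sketch below computes the shear strength and the directions Q1, Q2 from the flow parameters, and evaluates the rate of extension of the deviator part of eqn (4.4) along a given unit direction. The function names shear_axes and stretch_rate are ours; stretch_rate is simply the quadratic form of the deviator, which we use here as an illustrative measure of extension.

```python
import cmath


def shear_axes(A, B, C, D):
    """Return (shear strength abs(S), Q1, Q2) with Q1, Q2 as unit complex numbers."""
    S = complex(A - D, B + C)
    Q1 = cmath.exp(1j * cmath.phase(S) / 2.0)  # Q1 = e(arg(S)/2)
    Q2 = 1j * Q1                               # Q2 = iQ1
    return abs(S), Q1, Q2


def stretch_rate(A, B, C, D, direction):
    """Quadratic form of the deviator part of eqn (4.4) along a unit direction."""
    dx, dy = direction.real, direction.imag
    half_diff, half_sum = (A - D) / 2.0, (B + C) / 2.0
    return half_diff * dx * dx + 2.0 * half_sum * dx * dy - half_diff * dy * dy
```

Along Q1 the rate should equal abs(S)/2 and along Q2 it should equal -abs(S)/2, in agreement with the canonical form (4.7).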


NOTE. The easiest way to see that the quantities of eqns (4.6) are really
invariants is to consider the "infinitesimal transformations." Let d
denote differentiation with respect to t at t = 0. From A' = RAR^T, we
immediately obtain dA = (dR)A - A(dR), where

\[ dR = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \tag{4.8} \]

Hence, we obtain

dA = B + C,  dB = -A + D,  dC = -A + D,  dD = -B - C,  (4.9)

from which results

dT = 0,  dR = 0,  dS = -2iS.  (4.10)

Thus, we can confirm that T and R are really absolute invariants while S
is a relative invariant of weight -2.

5.  DETERMINATION  OF  THE  ROTATION  AROUND  THE  z-AXIS 

Now that we have prepared the necessary mathematical preliminaries, we
proceed to determining the surface and motion parameters. Since a, b are
directly obtained, we only have to compute p, q, w1, w2, w3 from A, B, C,
D. Comparing eqns (2.1) and (2.2) with eqn (2.9), we have a set of
equations to solve as follows:

A = pw2,  B = qw2 - w3,  C = -pw1 + w3,  D = -qw1.  (5.1)

In terms of the irreducible parameters, eqns (5.1) become

T = pw2 - qw1,  R = 2w3 - pw1 - qw2,  (5.2)

S = pw2 + qw1 + i(qw2 - pw1).  (5.3)

Eqns (5.2) are combined together if we consider the complex number R + iT,
which is also an absolute invariant. We get

R + iT = 2w3 - pw1 - qw2 + i(pw2 - qw1).  (5.4)

Hence, the given equations (5.1) are equivalent to eqns (5.4) and (5.3).
If we use the complex forms of eqns (3.9) for p, q, w1, w2, the right-hand
sides of eqns (5.4) and (5.3) become 2w3 - PW* and -iPW, respectively,
where * designates the complex conjugate. Hence, solving eqns (5.1) is
equivalent to solving

PW* = 2w3 - (R + iT),  (5.5)

PW = iS,  (5.6)

with P, W and w3 as unknowns. Note that P is of weight -1 and W* is of
weight 1, so that PW* is of weight 0, i.e., an absolute invariant. Hence,
both sides of eqn (5.5) are absolute invariants (weight 0). Likewise,
both sides of eqn (5.6) are of weight -2.
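The equivalence of eqns (5.1) with the complex forms (5.5) and (5.6) can be checked by direct substitution. The sketch below does so numerically for arbitrary test values; the function names are ours and serve only as an illustration.

```python
def flow_matrix(p, q, w1, w2, w3):
    """Eqn (5.1): flow parameters (A, B, C, D) of a rigidly moving plane."""
    return p * w2, q * w2 - w3, -p * w1 + w3, -q * w1


def complex_residuals(p, q, w1, w2, w3):
    """Magnitudes of the residuals of eqns (5.5) and (5.6)."""
    A, B, C, D = flow_matrix(p, q, w1, w2, w3)
    T, R, S = A + D, C - B, complex(A - D, B + C)   # eqn (4.6)
    P, W = complex(p, q), complex(w1, w2)           # eqn (3.9)
    res_55 = P * W.conjugate() - (2.0 * w3 - complex(R, T))
    res_56 = P * W - 1j * S
    return abs(res_55), abs(res_56)
```

Both residuals should vanish up to rounding for any choice of p, q, w1, w2, w3.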

First, consider w3. It is determined from the fact that the left-hand
sides of eqns (5.5) and (5.6) have the same magnitude. Hence, we get (2w3
- (R + iT))(2w3 - (R - iT)) = SS*, so that w3 is given as a root of the
quadratic equation

X^2 - RX + (T^2 + R^2 - SS*)/4 = 0.  (5.7)

Since this is of an absolutely invariant form (note SS* is an absolute
invariant), the solution w3 is a scalar as is expected. Eqn (5.7) has two
roots

X = (R ± sqr(SS* - T^2))/2.  (5.8)

In order that the solutions be real, the discriminant must be
non-negative. Namely,

abs(T) ≤ abs(S).  (5.9)

In  terms  of  hydrodynamic  analogies: 

LEMMA  1.  The  magnitude  of  divergence  should  not  be  greater  than  the  shear 
strength. 


Here,  we  have  obtained  a  "criterion  of  rigidity."  Namely,  if  the  observed 
values  of  A,  B,  C,  D  do  not  satisfy  inequality  (5.9),  the  flow  cannot  be 
interpreted  as  that  of  rigid  plane  motion. 
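The criterion of rigidity and the roots (5.8) translate directly into code. The sketch below is a minimal illustration: the function name w3_candidates and the tolerance tol, which absorbs small negative discriminants caused by rounding or noise, are our own assumptions and not part of the original formulation.

```python
import math


def w3_candidates(A, B, C, D, tol=1e-12):
    """Roots of eqn (5.7), or None if the criterion (5.9) is violated."""
    T = A + D
    R = C - B
    SS = (A - D) ** 2 + (B + C) ** 2   # S S* = abs(S)^2
    disc = SS - T * T                  # non-negative iff abs(T) <= abs(S)
    if disc < -tol:
        return None                    # not a rigid plane motion
    root = math.sqrt(max(disc, 0.0))
    return (R + root) / 2.0, (R - root) / 2.0
```

For flow parameters generated from eqn (5.1), one of the two returned values should coincide with the w3 used to generate them; a purely diverging flow such as A = D = 1, B = C = 0 should be rejected.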

If inequality (5.9) is satisfied, eqn (5.8) gives two real roots. The
two solutions are

X = w3 and w3 - (pw1 + qw2).  (5.10)

This can be checked by substituting eqns (5.1) in eqn (5.7). We get X^2 -
(2w3 - pw1 - qw2)X + w3(w3 - (pw1 + qw2)) = 0, or (X - w3)(X - w3 + pw1 +
qw2) = 0. Thus, one of the two roots gives the true solution while the
other gives a "spurious solution," and we cannot tell one from the other
for a given optical flow. Then, as we show in the next section, p, q, w1,
w2 are determined for each of these solutions, resulting in two types of
solutions, the "true" and the "spurious" one. However, the spurious
solution disappears if two plane faces of the same object are observed.
This will be discussed later. Note that pw1 + qw2 is a scalar because it
is the inner product of two "vectors" P = p + iq and W = w1 + iw2. Also
note that the spurious solution is absent only when the equality in
(5.9) holds, in which case pw1 + qw2 = 0, i.e., P = p + iq and W = w1 +
iw2 are mutually "orthogonal."
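
The factorization above can be checked numerically. The sketch below
assumes that eqns (5.1) have the form A = pw2, B = qw2 - w3, C = w3 - pw1,
D = - qw1; this form is not quoted in this section and is used here only
because it is consistent with the numbers of the examples. The motion
values are hypothetical.

```python
import math

def flow_from_motion(p, q, w1, w2, w3):
    """Flow parameters A, B, C, D under the assumed form of eqns (5.1)."""
    return (p * w2, q * w2 - w3, w3 - p * w1, -q * w1)

def roots_of_5_7(A, B, C, D):
    """Both roots of eqn (5.7) in ascending order."""
    T, R, S = A + D, C - B, complex(A - D, B + C)   # assumed definitions
    root = math.sqrt(max(abs(S) ** 2 - T ** 2, 0.0))
    return sorted([(R - root) / 2, (R + root) / 2])

p, q, w1, w2, w3 = 0.3, -0.2, 0.5, 0.1, 0.4         # hypothetical motion
A, B, C, D = flow_from_motion(p, q, w1, w2, w3)
computed = roots_of_5_7(A, B, C, D)
expected = sorted([w3, w3 - (p * w1 + q * w2)])     # eqn (5.10)
print(computed, expected)
```

When P and W are orthogonal (pw1 + qw2 = 0) the two roots coincide, which
is the case of equality in (5.9).
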

6.  DETERMINATION  OF  SURFACE  ORIENTATION  AND  ROTATION

All the equations to be solved for P = p + iq and W = w1 + iw2 are eqns
(5.5) and (5.6). It is immediately observed that the magnitude of W
cannot be determined uniquely, since W multiplied by a scale factor
together with P divided by that factor also satisfies the original
equations. Hence, we can take the magnitude k = abs(W) as an indeterminate
scale factor. Of course, we could take w1 as an indeterminate scale
factor and express the rest in terms of w1, or we could take w2, or (w1 +
w2)/2, etc. However, these quantities are not invariants and hence the
interpretations in terms of them do not have invariant meanings, while k
is an invariant and hence leads to invariant geometrical interpretations.
Now, take the ratio of eqn (5.6) to eqn (5.5). We get

W/W* = iS/(2w3 - (R + iT)).  (6.1)

Similarly,  if  we  take  the  ratio  of  eqn  (5.6)  to  the  complex  conjugate  of 
eqn  (5.5),  we  get 

P/P*  =  iS/(2w3  -  (R  -  iT)).  (6.2) 

The left hand sides of eqns (6.1) and (6.2) are e(2arg(W)) and e(2arg(P)),
respectively. From eqn (6.1), we conclude that 2arg(W) = π/2 + arg(S) -
arg(2w3 - (R + iT)). There exist two values for arg(W) mutually opposite
with respect to the origin. However, we can pick one of them arbitrarily,
say

arg(W) = π/4 + arg(S)/2 - arg(2w3 - (R + iT))/2,  (6.3)

if we allow the scale factor k to be negative. This does not lose the
uniqueness of the expression W = ke(arg(W)). Thus, we have completely
determined w1 and w2, since the scale factor k is an essential
indeterminate. Namely,

W = ke(π/4 + arg(S)/2 - arg(2w3 - (R + iT))/2).  (6.4)

Now that we have obtained W, the remaining P is determined from eqn
(5.6) by

P = iS/W

  = Se(π/4 - arg(S)/2 + arg(2w3 - (R + iT))/2)/k.  (6.5)

Note  that  S  is  of  weight  -  2,  while  W  is  of  weight  -  1.  Hence,  P  is  of 
weight  -  1,  i.e.,  transformed  as  a  vector,  as  expected.  Thus,  we  obtain 

THEOREM  1. 

w3 = (R ± sqr(SS* - T^2))/2,  (5.8)

W = ke(π/4 + arg(S)/2 - arg(2w3 - (R + iT))/2),  (6.4)

P = Se(π/4 - arg(S)/2 + arg(2w3 - (R + iT))/2)/k,  (6.5)

where  k  is  an  arbitrary  real  number. 
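
A minimal sketch of how Theorem 1 might be evaluated is given below. It
reads e(t) as exp(it) and again assumes T = A + D, R = C - B and
S = (A - D) + i(B + C); the function name solve_W_P is hypothetical. The
input is the flow of Example 1 together with the larger root of eqn (5.8).

```python
import cmath
import math

def solve_W_P(A, B, C, D, w3, k=1.0):
    """Evaluate eqns (6.4) and (6.5) for one root w3 and scale factor k."""
    T, R = A + D, C - B                      # assumed definitions
    S = complex(A - D, B + C)                # assumed definition
    denom = complex(2 * w3 - R, -T)          # 2w3 - (R + iT)
    arg_W = math.pi / 4 + cmath.phase(S) / 2 - cmath.phase(denom) / 2  # (6.3)
    W = k * cmath.exp(1j * arg_W)            # (6.4), with e(t) = exp(it)
    P = 1j * S / W                           # from eqn (5.6): PW = iS
    return W, P

A, B, C, D = 0.0873, -0.2269, 0.0873, 0.0524
T, R, S = A + D, C - B, complex(A - D, B + C)
w3 = (R + math.sqrt(abs(S) ** 2 - T ** 2)) / 2   # larger root of (5.8)
W, P = solve_W_P(A, B, C, D, w3)
print(W, P)
```

Changing k only rescales the pair: W is multiplied by k and P is divided
by it, so the products PW and PW* in eqns (5.5) and (5.6) are unchanged.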

On  the  other  hand,  we  find,  from  eqn  (6.5),  that 

arg(P) = arg(S) - arg(W) ± π/2,  (6.6)

where  the  double  sign  corresponds  to  the  sign  of  the  scale  factor  k. 
Therefore,  we  see  that 

(arg(P) + arg(W))/2 = arg(S)/2 ± π/4.  (6.7)

This implies a simple interpretation in hydrodynamic analogies. Recall
that arg(S)/2 is the direction of maximum extension. Hence, the
orientations designated by eqn (6.7) bisect the angle made by the
directions of maximum extension and maximum compression. These
orientations are known as the directions of "maximum shearing," because
the viscosity becomes maximum along these directions. Thus, we conclude:

COROLLARY 1. The orientations of P = p + iq and W = w1 + iw2 are
symmetric with respect to the direction of maximum shearing.

This statement, of course, has an invariant meaning irrespective of the
choice of the coordinate system.

So far, we have assumed that w3 is the true rotation velocity. Suppose
it is the spurious one w3 - (pw1 + qw2). If we replace w3 in eqn (6.1)
by w3 - (pw1 + qw2), eqn (6.1) becomes

W/W* = - iS/(2w3 - (R - iT)),  (6.8)

where w3 on the right hand side is the true value and we have used 2w3 -
R = pw1 + qw2, so that the replacement changes 2w3 - (R + iT) into
- (2w3 - (R - iT)). Comparing this with eqn (6.2), we see that spurious
2arg(W) is opposite to true 2arg(P). In other words, the orientation of
spurious W is orthogonal to that of true P, and we cannot say any more
about its orientation because the magnitude k of W is indeterminate,
including its sign. If we obtain spurious W by eqn (6.4), spurious P is
again determined by eqn (6.5). It can be immediately seen that the
orientation of spurious P is orthogonal to that of true W, and the above
observation is still valid for spurious P and W. At the same time, we
observe the following with respect to the true and spurious solutions.

COROLLARY 2. The orientations of true and spurious W are symmetric with
respect to the principal axes of the flow, and so are the orientations of
true and spurious P.

This  statement  also  has  an  invariant  meaning. 

EXAMPLE 1. Consider the flow of Fig. 8. The flow parameters are
a = 0.1, b = 0.1, A = 0.0873, B = - 0.2269, C = 0.0873, D = 0.0524,
and hence

T = 0.1397, R = 0.3142, S = 0.0349 - 0.1396i.

Since abs(S) = 0.1439, we see abs(T) < abs(S) and hence the flow can be
regarded as that of rigid motion. First, from eqn (5.8), we obtain w3 =
10, 8 (deg/sec). The remaining components of rotation and the gradients
for these two roots are, respectively,

W1 = (0.7061 + 0.7081i)k (rad/sec), P1 = (0.1233 - 0.0742i)/k,

W2 = (0.5157 + 0.8568i)k (rad/sec), P2 = (0.1019 - 0.1016i)/k,

where k is the indeterminate scale factor. One set of these is the true
solution and the other is the spurious one. Fig. 9 illustrates these when
k = 0.5. There, the principal axes and the directions of maximum shearing
are also indicated.
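
Corollaries 1 and 2 and the orthogonality of spurious W to true P can be
checked on the flow of Example 1 with a short sketch such as the one
below. It assumes T = A + D, R = C - B, S = (A - D) + i(B + C), reads e(t)
as exp(it), and compares orientations as undirected axes; the helper
names both_solutions and fold are hypothetical.

```python
import cmath
import math

def both_solutions(A, B, C, D):
    """Return S and [(w3, W, P), ...] for both roots of (5.8), with k = 1."""
    T, R, S = A + D, C - B, complex(A - D, B + C)   # assumed definitions
    root = math.sqrt(abs(S) ** 2 - T ** 2)
    sols = []
    for w3 in ((R + root) / 2, (R - root) / 2):
        denom = complex(2 * w3 - R, -T)             # 2w3 - (R + iT)
        arg_W = math.pi / 4 + cmath.phase(S) / 2 - cmath.phase(denom) / 2
        W = cmath.exp(1j * arg_W)                   # eqn (6.4)
        sols.append((w3, W, 1j * S / W))            # eqn (6.5)
    return S, sols

def fold(angle, period):
    """Distance from angle to the nearest multiple of period."""
    r = angle % period
    return min(r, period - r)

S, ((_, W1, P1), (_, W2, P2)) = both_solutions(0.0873, -0.2269, 0.0873, 0.0524)
arg = cmath.phase
shear_dir = arg(S) / 2 + math.pi / 4                # maximum shearing
# Corollary 1: P and W are symmetric about a direction of maximum shearing.
cor1 = fold((arg(P1) + arg(W1)) / 2 - shear_dir, math.pi / 2)
# Corollary 2: the two W (and the two P) are symmetric about a principal axis.
cor2_W = fold((arg(W1) + arg(W2)) / 2 - arg(S) / 2, math.pi / 2)
cor2_P = fold((arg(P1) + arg(P2)) / 2 - arg(S) / 2, math.pi / 2)
# W of one solution is orthogonal to P of the other.
ortho = fold(arg(W2) - arg(P1) - math.pi / 2, math.pi)
print(cor1, cor2_W, cor2_P, ortho)
```

All four printed residuals should vanish up to rounding, whichever of the
two solutions is regarded as the true one.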


7.  ADJACENT  TWO  OPTICAL  FLOWS 

Now, we consider two regions of an image which have different optical
flows, i.e., different flow parameters a, b, A, B, C, D. Obviously, this
arises  if  the  object  is  a  polyhedron.  However,  the  object  can  have  a 
smoothly  varying  surface,  in  which  case  we  divide  the  surface  image  into 
small  regions  each  of  which  can  be  regarded  as  almost  planar,  say 
according  to  the  criterion  of  planarity  discussed  in  Section  2. 

Let a, b, A, B, C, D be the parameters of one region and a', b', A',
B', C', D' those of the other, and assume that these two sets are not
identical. If the two regions are planar and adjacent to each other,
their intersection must be a straight line, on which u, v must be
continuous, i.e.,

[a] + [A]x + [B]y = 0,
[b] + [C]x + [D]y = 0,  (7.1)

where [ ] designates the difference, e.g., [a] = a' - a. The necessary
and sufficient condition that eqns (7.1) define one and the same line is

[a] : [b] = [A] : [C] = [B] : [D].  (7.2)

Eqn (7.2) gives a "criterion of adjacency." In other words, if eqn (7.2)
is not satisfied (within a certain error), the two regions cannot be
regarded as being adjacent to each other. If eqn (7.2) is satisfied,
eqns (7.1) define the "intersection line." If the two regions are two
adjacent faces of a polyhedron, the intersection line may be directly
observed. However, even when intersection lines are missing due to noise
or some technical difficulties, we can still recover them once we can
successfully estimate the flow parameters on each region, say by eqns
(2.6). Moreover, even when the two regions are neighboring "almost
planar" patches of a smoothly varying surface, the "hypothetical
intersection line" is still defined.
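
The criterion of adjacency and the intersection line can be sketched as
follows. The proportionality (7.2) is tested through vanishing cross
products so that no division by a small difference is needed; the
relative tolerance tol and the function name intersection_line are
assumptions of this sketch, not part of the analysis above.

```python
def intersection_line(f, g, tol=1e-2):
    """Flows are tuples (a, b, A, B, C, D). Return (m, n) for the line
    y = m*x + n, or None when criterion (7.2) fails within tol."""
    da, db, dA, dB, dC, dD = (y - x for x, y in zip(f, g))  # [a], [b], ...
    scale = max(abs(d) for d in (da, db, dA, dB, dC, dD)) ** 2
    if scale == 0.0:
        return None                       # identical flows: no line defined
    # Eqn (7.2) written as three vanishing cross products.
    crosses = (da * dC - db * dA, dA * dD - dC * dB, da * dD - db * dB)
    if any(abs(c) > tol * scale for c in crosses):
        return None                       # the regions are not adjacent
    # Use the row of (7.1) with the larger coefficient of y.
    c0, cx, cy = (da, dA, dB) if abs(dB) >= abs(dD) else (db, dC, dD)
    if cy == 0.0:
        raise ValueError("vertical line: cannot be written as y = m*x + n")
    return (-cx / cy, -c0 / cy)

# The two flows of Example 2, in the order (a, b, A, B, C, D).
f1 = (-0.1, 0.2, 0.2094, -0.1047, 0.0698, -0.0349)
f2 = (-0.1489, 0.2244, -0.1396, -0.3490, 0.2443, 0.0873)
line = intersection_line(f1, f2)
print(line)
```

For the two flows of Example 2 the sketch returns a line close to
y = - 1.4286x - 0.2.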

Next,  we  must  check  if  the  two  adjacent  planar  regions  are  "rigidly" 
connected.  Obviously,  a  "criterion  of  rigid  adjacency"  is  given  by 
testing if we can determine common motion parameters a, b, w1, w2, w3.
(If  the  computation  is  done  with  respect  to  different  coordinate  systems 
for  the  two  regions,  we  must  compare  them  after  appropriately  transforming 
them  as  is  discussed  in  Section  3.) 

Suppose two regions are images of two planes z = px + qy + r and z =
p'x + q'y + r'. Since z = px + qy + r = p'x + q'y + r' on the
intersection line, its equation on the image plane becomes

[p]x + [q]y + [r] = 0.  (7.3)

Hence, we see that

LEMMA 2. Vector [P] is perpendicular to the intersection line.

According to Section 5, we can first compute w3 for the two regions
separately, ending up with two solutions for each region, the true and the
spurious one. If the two regions are rigidly connected, the true one must
be common to them, and we can pick the common one as the true w3. If
both the true and spurious solutions are common, we have pw1 + qw2 = p'w1
+ q'w2 according to eqn (5.10). This means [p]w1 + [q]w2 = 0 and hence
[P] is perpendicular to W = w1 + iw2. Since the intersection line is
always perpendicular to [P], W must be parallel to the intersection line.
As was pointed out in the previous section, we can only determine W's
orientation as an undirected axis, and hence we can say that W is
determined. Then, we can pick the correct value of w3 so that eqn
(6.1) or (6.2) is satisfied. Once we have determined w3 uniquely, we can
compute W for each of the regions. If the two orientations of W do not
coincide (within a certain error), the two regions cannot be regarded as
being rigidly connected. If they coincide, the scale factor k can be
taken to be common to both.
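
The selection of the common w3 and the test that both regions give the
same orientation of W can be sketched as below. The sketch assumes
T = A + D, R = C - B, S = (A - D) + i(B + C); the tolerance and the
function names are hypothetical, and the case where both roots are
common is not treated.

```python
import cmath
import math

def analyse(A, B, C, D):
    """Return T, R, S and both roots of eqn (5.8) for one region."""
    T, R, S = A + D, C - B, complex(A - D, B + C)   # assumed definitions
    root = math.sqrt(max(abs(S) ** 2 - T ** 2, 0.0))
    return T, R, S, ((R + root) / 2, (R - root) / 2)

def w_axis(T, R, S, w3):
    """Orientation of W from eqn (6.3) as an undirected axis in [0, pi)."""
    denom = complex(2 * w3 - R, -T)                 # 2w3 - (R + iT)
    angle = math.pi / 4 + cmath.phase(S) / 2 - cmath.phase(denom) / 2
    return angle % math.pi

def common_rotation(flow1, flow2, tol=1e-3):
    """List (w3, axis1, axis2) for every root shared by the two regions."""
    T1, R1, S1, roots1 = analyse(*flow1)
    T2, R2, S2, roots2 = analyse(*flow2)
    out = []
    for x in roots1:
        for y in roots2:
            if abs(x - y) <= tol:
                w3 = (x + y) / 2
                out.append((w3, w_axis(T1, R1, S1, w3), w_axis(T2, R2, S2, w3)))
    return out

flow1 = (0.2094, -0.1047, 0.0698, -0.0349)    # A, B, C, D of Example 2
flow2 = (-0.1396, -0.3490, 0.2443, 0.0873)    # A', B', C', D' of Example 2
matches = common_rotation(flow1, flow2)
print([(math.degrees(w), math.degrees(a1), math.degrees(a2))
       for w, a1, a2 in matches])
```

For Example 2 a single common root near 10 (deg/sec) is found, and the
two axes agree closely, so the regions can be regarded as rigidly
connected.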

Now, if y = mx + n is the computed intersection line, we must have, in
comparison with eqn (7.3), that [p] : [q] : [r] = m : - 1 : n. Hence,

LEMMA 3. If the intersection line is y = mx + n, the equations of the
planes are

z = px + qy + r,
z = p'x + q'y + (r - [q]n).

This can also be extended to other regions. Hence, we have established
the following fact.


THEOREM  2.  The  structure  and  motion  of  an  object  are  determined  from  its 
optical  flow  under  orthographic  projection  only  up  to  an  unknown  absolute 
depth  r  and  an  indeterminate  scale  factor  k  aside  from  the  translation  in 
the  z-direction. 


We  have  also  given  explicit  formulae  of  computation. 


EXAMPLE 2. Consider the flow of Fig. 10. According to the discussion in
Section 2, we conclude that this is not a flow induced by the motion of a
single plane but consists of two different flows with flow parameters

a = - 0.1, b = 0.2, a' = - 0.1489, b' = 0.2244,

A = 0.2094, B = - 0.1047, C = 0.0698, D = - 0.0349,

A' = - 0.1396, B' = - 0.3490, C' = 0.2443, D' = 0.0873.

Eqn (7.2) is satisfied within rounding error, and the intersection of the
plane surfaces must be y = - 1.4286x - 0.2, which is indicated in the
figure. From the former flow (upper right) we obtain w3 = 10, 0 (deg/sec)
and from the latter (lower left) w3 = 24, 10 (deg/sec). Hence, we
conclude that w3 = 10 (deg/sec), and the remaining rotation components
become

W = (0.4472 + 0.8944i)k (rad/sec),

where k is the indeterminate scale factor. The gradients are given by

P = (0.2341 + 0.0780i)/k, P' = (- 0.1561 - 0.1951i)/k,

respectively. They are indicated in Fig. 11 when k = 0.5. Note that [P]
is always perpendicular to the intersection line. The equations of the
two planes are

z = (0.2341x + 0.0780y)/k + r,
z = (- 0.1561x - 0.1951y)/k + (r - 0.0546/k).
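
Lemmas 2 and 3 can be checked on these numbers with a short sketch. It
sets k = 1 and r = 0, which are free parameters, and the helper names
second_offset and depth are hypothetical.

```python
def second_offset(P, P_prime, n):
    """r' - r from Lemma 3: the second plane is z = p'x + q'y + (r - [q]n)."""
    dq = P_prime.imag - P.imag            # [q] = q' - q
    return -dq * n

def depth(P, r, x, y):
    """z = px + qy + r for the gradient P = p + iq."""
    return P.real * x + P.imag * y + r

# Values of Example 2 with k = 1 and r = 0.
P, P_prime = complex(0.2341, 0.0780), complex(-0.1561, -0.1951)
m, n = -1.4286, -0.2                      # intersection line y = m*x + n
offset = second_offset(P, P_prime, n)

# Lemma 2: [P] is perpendicular to the line direction (1, m).
dP = P_prime - P
dot = dP.real * 1.0 + dP.imag * m

# Both planes must give the same depth at any point of the line.
x = 0.3                                   # arbitrary abscissa on the line
y = m * x + n
gap = depth(P, 0.0, x, y) - depth(P_prime, offset, x, y)
print(offset, dot, gap)
```

The offset comes out near - 0.0546, and the inner product and the depth
gap are zero up to the rounding of the quoted four-digit values.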

8.  CONCLUDING  REMARKS 

In this paper, we have exhausted all that can be known given an optical
flow under orthographic projection of a rigid object moving in 3D
space. First, the image is divided into small regions which are either
planar faces or almost-planar patches of a smoothly varying surface. A
criterion for it was also discussed. Analyzing each of these regions, we
have reached the conclusion that the motion can be recovered up to a
common absolute depth and a common scale factor. We also presented
explicit formulae of computation. This conclusion was partly pointed out
by Sugihara and Sugie [15], who took a correspondence-based approach and
considered a finite number of rigidly moving points. They proved that one
scale factor must be involved in addition to the indeterminate absolute
depth, but they failed to show that the solution is unique once the
scale factor and the absolute depth are given. Moreover, their algorithm
is not perfect and it may produce physically impossible solutions.

On the other hand, the correspondence-based approach can be incorporated
in our flow-based approach. Consider the case where each almost-planar
patch consists of only three points, which amounts to polyhedral
approximation of the object. Then, observing the velocities of three
points is equivalent to observing the optical flow of the plane spanned by
these three points. For example, suppose we measured velocities (u, v),
(u', v') and (u", v") at three points (x, y), (x', y') and (x", y"),
respectively. Then, the flow parameters a, b, A, B, C, D are determined
by solving simultaneous equations


[ 1  x   y  ] [ a ]   [ u  ]
[ 1  x'  y' ] [ A ] = [ u' ],  (8.1)
[ 1  x"  y" ] [ B ]   [ u" ]

[ 1  x   y  ] [ b ]   [ v  ]
[ 1  x'  y' ] [ C ] = [ v' ].  (8.2)
[ 1  x"  y" ] [ D ]   [ v" ]

These give a unique solution for a, b, A, B, C, D unless the determinant

| 1  x   y  |
| 1  x'  y' |  (8.3)
| 1  x"  y" |

vanishes, which is the condition for collinearity of the three points.
Hence, if velocities at three non-collinear points are observed, the flow
parameters are determined.


EXAMPLE 3. Suppose we measured velocities at

r = (0.6, 0.2), r' = (- 0.2, - 0.4), r" = (- 0.4, 0.8),

and observed

u = (- 0.0416, 0.1052), u' = (- 0.0975, 0.1767), u" = (0.077, 0.1593),

respectively (Fig. 12). From eqns (8.1) and (8.2),

a = - 0.0486, b = 0.1523,

A = - 0.0349, B = 0.1396, C = - 0.0698, D = - 0.0262,

T = - 0.0611, R = - 0.2094, S = - 0.0087 + 0.0698i.

The corresponding flow is shown in Fig. 13. By the procedure we showed
before, we see that the two solutions are

w3 = - 5 (deg/sec), W = (0.4477 + 0.8942i)k (rad/sec),
P = (- 0.0390 + 0.0585i)/k,

w3 = - 7 (deg/sec), W = (0.8319 + 0.5549i)k (rad/sec),
P = (- 0.0629 + 0.0315i)/k.

Hence, the equation of the plane is

z = - 0.0390x/k + 0.0585y/k + r or z = - 0.0629x/k + 0.0315y/k + r.

Therefore, the z-coordinates of the three points are

z = - 0.0117/k + r, z' = - 0.0156/k + r, z" = 0.0624/k + r

or

z = - 0.0314/k + r, z' = r, z" = 0.0504/k + r,

where k is the common scale factor and r is the common absolute depth.
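
The three-point computation used in this example can be sketched with
Cramer's rule. The sketch writes the flow as u = a + Ax + By and
v = b + Cx + Dy, which is the form implied by eqns (7.1); the function
names are hypothetical and the collinearity threshold is an arbitrary
choice.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def flow_from_three_points(points, velocities):
    """Solve for (a, b, A, B, C, D) from three points and their velocities."""
    rows = [[1.0, x, y] for x, y in points]
    d = det3(rows)                       # vanishes for collinear points
    if abs(d) < 1e-12:
        raise ValueError("the three points are collinear")

    def solve(rhs):
        # Cramer's rule: replace one column at a time by the right-hand side.
        sol = []
        for col in range(3):
            m = [row[:] for row in rows]
            for i in range(3):
                m[i][col] = rhs[i]
            sol.append(det3(m) / d)
        return sol

    a, A, B = solve([u for u, _ in velocities])   # u = a + A*x + B*y
    b, C, D = solve([v for _, v in velocities])   # v = b + C*x + D*y
    return a, b, A, B, C, D

points = [(0.6, 0.2), (-0.2, -0.4), (-0.4, 0.8)]
velocities = [(-0.0416, 0.1052), (-0.0975, 0.1767), (0.077, 0.1593)]
params = flow_from_three_points(points, velocities)
print([round(value, 4) for value in params])
```

With the data of Example 3 the result agrees with the quoted a, b, A, B,
C, D to about three decimal places.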


Thus, our flow-based approach is a generalization of the
correspondence-based approach, including it as a special case. In
practice, however, observed velocities of a small number of points are
unreliable due to noise and misdetection of point correspondence, as was
pointed out earlier. Hence, it seems a wise policy to base the whole
computation on the flow parameters a, b, A, B, C, D obtained by the
process of taking sums or averages of a number of data, which is less
sensitive to local errors in general. Therefore, our present formulation
seems preferable for actual processing. Moreover, since our flow-based
approach starts with the flow parameters, we do not necessarily need to
have the optical flow or detect point correspondence. For example, if we
use the methods of Kanatani [7 - 9], the flow parameters are determined
directly without knowing correspondence. As we have seen, indeterminacy
is involved in one optical flow. However, the indeterminacy is reduced if
a sequence of successive optical flows of the same object is observed,
because p, q, r, a, b, c (the velocity along the z-axis if not zero), w1,
w2 and w3 cannot evolve arbitrarily. Namely, we have the following
"compatibility conditions":

dp/dt = pqw1 - (p^2 + 1)w2 - qw3,  (8.4)

dq/dt = (q^2 + 1)w1 - pqw2 + pw3,  (8.5)

dr/dt = c - pa - qb.  (8.6)
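
Conditions (8.4) and (8.5) can be checked against a direct rotation of
the surface normal. The sketch below assumes that the normal N = (p, q,
- 1) of a rigidly rotating plane evolves as dN/dt = w x N, and compares a
finite-difference derivative with the two right-hand sides. The numerical
values and function names are hypothetical.

```python
def gradient_rates(p, q, w1, w2, w3):
    """Right-hand sides of eqns (8.4) and (8.5)."""
    dp = p * q * w1 - (p * p + 1) * w2 - q * w3
    dq = (q * q + 1) * w1 - p * q * w2 + p * w3
    return dp, dq

def rotate_gradient(p, q, w1, w2, w3, dt):
    """Rotate N = (p, q, -1) by w*dt to first order, then rescale N so that
    its z-component is -1 again and read off the new gradient."""
    n1 = p + dt * (-w2 - w3 * q)          # (w x N)_x = w2*(-1) - w3*q
    n2 = q + dt * (w3 * p + w1)           # (w x N)_y = w3*p - w1*(-1)
    n3 = -1.0 + dt * (w1 * q - w2 * p)    # (w x N)_z = w1*q - w2*p
    return -n1 / n3, -n2 / n3

p, q, w = 0.3, -0.2, (0.5, 0.1, 0.4)      # hypothetical gradient and rotation
dt = 1e-6
p2, q2 = rotate_gradient(p, q, *w, dt)
numeric = ((p2 - p) / dt, (q2 - q) / dt)
exact = gradient_rates(p, q, *w)
print(numeric, exact)
```

The two pairs should agree up to an error of the order of dt, which
supports the signs of the individual terms in (8.4) and (8.5).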

Taking a flow-based approach rather than the correspondence-based
approach of Sugihara and Sugie [15] has also led to various other useful
concepts and interpretations. Our flow-based analysis enabled us to study
the transformation properties under coordinate changes and to express the
quantities, formulations and interpretations in frame indifferent manners.
The concept of invariance is important not only for consistent and elegant
descriptions but also for practical applications, because it allows us to
choose a specific coordinate system suitable to each different region.
Furthermore, the concept of invariance has naturally led to hydrodynamic
analogies which make clear the intuitive meanings of our interpretations.
Taking full advantage of our invariant approach, we expressed the solution
in explicit forms, while Sugihara and Sugie [15] gave a scheme only to
compute numerically. In the course of our analysis, we showed the
existence of the spurious solution and gave its geometrical
interpretation. We also showed that it disappears if flows of two
different regions of the same object are observed.

FIGURE CAPTIONS

Fig. 1  A plane of equation z = px + qy + r is moving with translation
velocity (a, b, 0) at (0, 0, r) and rotation velocity (w1, w2,
w3) around it. An optical flow is induced on the xy-plane by
orthographic projection along the z-axis with (0, 0, - f) as the
viewpoint.

Fig. 2  Interpretation must be "invariant" with respect to coordinate
changes, i.e., it must "commute" with coordinate changes.

Fig. 3  A new x'y'-coordinate system is taken by rotating the
xy-coordinate system by t counterclockwise and translating it by (h1,
h2). The new origin O' is at (h1, h2) and the new x'-axis makes
angle t with the x-axis.

Fig.  4  Divergent  flow. 

Fig.  5  Rotational  flow. 

Fig.  6  Pure  shear  flow  with  two  principal  axes  Q1  (maximum  extension) 
and  Q2  (maximum  compression). 

Fig.  7  The  canonical  form  of  pure  shear  flow.  The  principal  axes 
coincide  with  the  coordinate  axes. 

Fig. 8  An example of optical flow. The flow parameters are a = 0.1, b =
0.1, A = 0.0873, B = - 0.2269, C = 0.0873, D = 0.0524.

Fig. 9  The result of our analysis of the flow of Fig. 8. Two solutions
are possible, the true one and the spurious one. The principal
axes and the direction of maximum shearing are also indicated.

Fig.  10  Another  example  of  optical  flow.  This  flow  cannot  be  regarded  as 
that  of  a  single  plane.  It  consists  of  two  planar  regions.  The 
dashed  line  is  the  intersection  line  computed  from  the  flow. 

Fig.  11  The  result  of  our  analysis  of  the  flow  of  Fig.  10.  The  spurious 
solution  does  not  appear. 

Fig. 12  Observation of three moving points on the image plane. The
optical flow of the plane spanned by these three points is
uniquely determined unless the three points are collinear.

Fig. 13  The computed hypothetical optical flow associated with the motion
of Fig. 12.